An Algorithm for Real-Time Stereo Vision Implementation of Head Pose and Gaze Direction Measurement

Authors

  • Yoshio Matsumoto
  • Alexander Zelinsky
Abstract

To build smart human interfaces, a system must know a user’s intention and point of attention. Since a person’s head pose and gaze direction are closely related to his/her intention and attention, detecting this information can be used to build natural and intuitive interfaces. In this paper, we describe our real-time stereo face tracking and gaze detection system, which measures head pose and gaze direction simultaneously. The key aspect of our system is the use of real-time stereo vision together with a simple algorithm that is suitable for real-time processing. Since the 3D coordinates of the features on a face can be measured directly in our system, we can significantly simplify the algorithm for 3D model fitting to obtain the full 3D pose of the head, compared with conventional systems that use a monocular camera. Consequently, we achieved a non-contact, passive, real-time, robust, accurate and compact measurement system for head pose and gaze direction.

∗This research was done at The Australian National University.

1 Face and Gaze Detection for Visual Human Interfaces

Smart human interfaces need to know a user’s intention and attention. For example, the direction of the gaze can be used to control the cursor on a monitor, and the motion of the head can be interpreted as a gesture such as “yes” or “no”. Several kinds of commercial products exist to detect a person’s head position and orientation, such as magnetic sensors and link mechanisms. There are also several companies offering products that perform eye gaze tracking. These products are generally highly accurate and reliable; however, all require either expensive hardware or artificial environments (cameras mounted on a helmet, infrared lighting, markings on the face, etc.). The discomfort and the restriction of motion affect the person’s behavior, which makes it difficult to measure his/her natural behavior.
To solve this problem, many research results have been reported on visually detecting the pose of a head [1, 2, 3, 4, 5, 6, 7]. Recent advances in hardware have allowed vision researchers to develop real-time face tracking systems. However, most of these systems use monocular vision. Recovering the 3D pose from a monocular image stream is known to be a difficult problem, and high accuracy as well as robustness are hard to achieve. Therefore, some approaches cannot compute the full 3D, 6DOF pose of the head, while other methods are not sufficiently accurate as a measurement system. Some researchers have also developed vision systems to passively detect the gaze point [8, 9, 10, 11]; however, none of them can measure the 3D vector of the gaze line.

In order to construct a system which observes a person without causing him/her any discomfort, it should satisfy the following requirements:

  • non-contact
  • passive
  • real-time
  • robust to occlusions, deformations and lighting fluctuations
  • compact
  • accurate
  • able to detect head pose and gaze direction simultaneously

Our system satisfies all these requirements by utilizing the following techniques:

  • real-time stereo vision hardware using a field multiplexing device,
  • image processing board with normalized correlation capability,
  • 3D facial feature model and model fitting based on virtual springs,
  • 3D eye model which assumes the eyeball to be a sphere.

The details of the hardware are described in Section 2, and the algorithm and implementation for face tracking and gaze detection are described in Section 3. Experimental results that show the real-time performance of the system are presented in Section 4. Finally, the conclusions and a discussion of future work are given in Section 5.

2 Hardware Configuration of Real-time Stereo Vision System

The hardware setup of our real-time stereo face tracking system is shown in Figure 1. We use an NTSC camera pair (SONY EVI-370DG × 2) to capture images of a person’s face.
The output video signals from the cameras are multiplexed into one video signal by the “field multiplexing technique” [12]. The multiplexed video stream is then fed into a vision processing board (Hitachi IP5000), where the pose of the head and the direction of the gaze are calculated.

2.1 IP5000 Image Processing Board

The IP5000 is a PCI half-sized image processing board. It is connected to an NTSC camera source and a video output monitor. It is equipped with 40 frame memories of 512 × 512 pixels. The image processing LSI runs at 73.5[MHz] and provides a wide variety of fast image processing functions performed in hardware. These include binarization, convolution, filtering, labeling, histogram calculation, color extraction and normalized correlation. In our system, the board is mainly used to execute normalized correlation for feature tracking and stereo matching.

2.2 Field Multiplexing Device

Field multiplexing is a technique used to generate a multiplexed video stream from two video streams in the analog phase. A diagram of the device is shown in Figure 2. The device takes two synchronized video streams as input into a video switching IC, and one of them is selected and output in every odd or even field. Thus the switching frequency is only 60[Hz], which makes the device easy and cheap to implement. A photo of the device is also shown in Figure 2. The device is less than 5[cm] square and uses only consumer electronic parts.

[Figure: PC (Pentium II 450MHz, 64MB memory), stereo camera, IP5000 vision processor]
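The two hardware steps described above, separating the fields of a multiplexed frame and matching a feature between the two views by normalized correlation, can be illustrated in software. The following is a minimal sketch, not the IP5000's implementation: it assumes that even rows hold one camera's field and odd rows the other's, that the cameras are aligned so a feature can be searched along a single row, and that the correlation is the zero-mean variant. All function names and the synthetic test pattern are hypothetical.

```python
import math


def split_fields(frame):
    """Split a field-multiplexed frame (list of pixel rows) into two images.

    Assumption: even rows carry one camera and odd rows the other; which
    camera lands in which field depends on the actual device.
    """
    return frame[0::2], frame[1::2]


def crop(image, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def ncc(patch_a, patch_b):
    """Zero-mean normalized correlation of two equally sized patches, in [-1, 1]."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch carries no texture to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def match_along_row(template, image, top):
    """Slide the template horizontally at a fixed row; return (best_left, best_score)."""
    height, width = len(template), len(template[0])
    best_left, best_score = 0, -2.0
    for left in range(len(image[0]) - width + 1):
        score = ncc(template, crop(image, top, left, height, width))
        if score > best_score:
            best_left, best_score = left, score
    return best_left, best_score


def make_image(rows, cols, blob, top, left):
    """Build a synthetic black image with a small textured blob pasted in."""
    image = [[0] * cols for _ in range(rows)]
    for r, blob_row in enumerate(blob):
        for c, value in enumerate(blob_row):
            image[top + r][left + c] = value
    return image


# Synthetic stereo pair: the same feature appears 2 pixels further left
# in the second view (hypothetical values chosen for illustration).
BLOB = [[10, 50, 20], [80, 200, 90], [30, 60, 40]]
left_view = make_image(5, 12, BLOB, top=1, left=6)
right_view = make_image(5, 12, BLOB, top=1, left=4)

# Interleave the two views row by row, as a field-multiplexed frame would be stored.
frame = [row for pair in zip(left_view, right_view) for row in pair]

left_field, right_field = split_fields(frame)
template = crop(left_field, 1, 6, 3, 3)
best_left, best_score = match_along_row(template, right_field, top=1)
disparity = 6 - best_left  # horizontal shift of the feature between the views
print(best_left, round(best_score, 6), disparity)
```

Because each field holds only every other scan line, each view has half the vertical resolution of the full frame. The sketch also ignores the half-line vertical offset between interlaced fields, which a real system would have to account for in calibration.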


Similar resources

Error Analysis of Head Pose and Gaze Direction from Stereo Vision

Rhys Newman and Alexander Zelinsky, Research School of Information Sciences and Engineering, Australian National University, Canberra ACT 0200. Abstract: There is a great deal of current interest in human face/head tracking as there are a number of interesting and useful applications. Not least among these is the goal of tracking the head in real-time. An interesting...


Field Programmable Gate Array–based Implementation of an Improved Algorithm for Objects Distance Measurement (TECHNICAL NOTE)

In this work, the design of a low-cost, field programmable gate array (FPGA)-based digital hardware platform that implements image processing algorithms for real-time distance measurement is presented. Using embedded development kit (EDK) tools from Xilinx, the system is developed on a Spartan-3 XC3S400, one of the common and low-cost field programmable gate arrays from the Xilinx Spartan fami...


Real-time Head Pose Estimation with Stereo Vision

Head pose estimation is an important task for many applications such as human-computer interaction and human action understanding since a person’s head direction has an important role in representing his/her intention. In this paper, we propose a real-time head pose estimation method with stereo vision, which does not stress users and is easily applied to a lot of users. We use the degree of th...


Rendering Optimizations Guided by Head-Pose Estimates and Their Uncertainty

In virtual environments, head pose and/or eye-gaze estimation can be employed to improve the visual experience of the user by enabling adaptive level of detail during rendering. In this study, we present a real-time system for rendering complex scenes in an immersive virtual environment based on head pose estimation and perceptual level of detail. In our system, the position and orientation of ...


Relative Pose Measurement Algorithm of Non-cooperative Target based on Stereo Vision and RANSAC

The final approach phase of spacecraft rendezvous and docking is extremely important. In order to solve the problem of the real-time acquisition of the relative pose between target and spacecraft in near distance (<2m), this paper established a binocular stereovision model, and proposed a non-cooperative target relative pose measuring method based on stereo vision and RANSAC algorithm. Linear c...




Publication date: 2000